Block v2: Paged indexes/data #577

joe-elliott · 2021-03-08T20:42:49Z

What this PR does:
Creates a v2 block that accesses its index by page instead of as a whole. This will allow us to grow our index sizes significantly.

Generally, I think this PR is solid, but I do have some concerns about the .Find() method on the v2.indexReader. When I first started this I thought we were going to use the "fact" that IDs should be roughly linearly distributed from minID to maxID in the block. I intended to estimate the first page from the requested ID, check that page, and go from there. However, trace IDs are either 64 bit or 128 bit. In a scenario where a block had a single 128 bit ID, it would skew the linearity assumption so badly that I felt the consistency of binary search was preferable (even if it underperforms in a lot of cases).

Adjusts the default index downsample bytes to 1MB. This will double the size of our indexes.
Sets the default size of an index page to 250KB. This implies roughly 10 pages on our largest blocks.
A side effect of moving index access from .Read() to .ReadRange() is that we are no longer caching indexes. I am fine with this for now, but we will need to revisit soon. If anything, the index pages should make them far more cacheable than before.
Added 2 fields to the meta needed to read the block: IndexPageSize and TotalRecords. I am unexcited about further bloat in the meta.json file, but I don't know where else to put this.
Wrapped the data in the same page object. This allows parsing of a data file even without an index.
Renamed PageReader/Writer => DataReader/Writer for clarity
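
To make the role of the two new meta fields concrete, here is a rough sketch of how a reader might turn IndexPageSize and TotalRecords into a page count and a byte range for a ranged read. Everything beyond those two field names is an assumption for illustration: the `blockMeta` struct, the helper names, the 28 byte record length, the `headerLen` parameter, and the idea that pages are fixed-size and laid end to end.

```go
package main

// recordLength is an assumed fixed record size in bytes
// (16 byte ID + 8 byte start offset + 4 byte length).
const recordLength = 28

// blockMeta is a hypothetical subset of the block meta; only the two
// fields named in this PR are included.
type blockMeta struct {
	IndexPageSize uint32
	TotalRecords  uint32
}

// recordsPerPage returns how many whole records fit in one page after
// reserving headerLen bytes for a page header (an assumed parameter).
func recordsPerPage(m blockMeta, headerLen uint32) uint32 {
	if m.IndexPageSize <= headerLen {
		return 0
	}
	return (m.IndexPageSize - headerLen) / recordLength
}

// totalPages returns the number of pages needed to hold every record,
// rounding up so a partially filled last page is counted.
func totalPages(m blockMeta, headerLen uint32) uint32 {
	per := recordsPerPage(m, headerLen)
	if per == 0 {
		return 0
	}
	return (m.TotalRecords + per - 1) / per
}

// pageRange returns the byte offset and length of page i, assuming
// fixed-size pages stored back to back in the index file.
func pageRange(m blockMeta, i uint32) (offset uint64, length uint32) {
	return uint64(i) * uint64(m.IndexPageSize), m.IndexPageSize
}
```

Under these assumptions a reader never needs the whole index in memory: the page count bounds the search, and each probe reads exactly one page's byte range.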

Which issue(s) this PR fixes:
Fixes #32

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>

tempodb/encoding/v2/index_reader.go


annanay25

LGTM.

joe-elliott added 24 commits March 3, 2021 14:09

Added v2

6de5821

Signed-off-by: Joe Elliott <[email protected]>

Added v2 page reader

585c268

Signed-off-by: Joe Elliott <[email protected]>

simplified bytes counting test

b334dce

Signed-off-by: Joe Elliott <[email protected]>

possibly cleaned up or made pagereaders more complicated

ba1842f

Signed-off-by: Joe Elliott <[email protected]>

paged index writer/reader:at

5feac5e

Signed-off-by: Joe Elliott <[email protected]>

Cleaned up tests

c0376a2

Signed-off-by: Joe Elliott <[email protected]>

moved 'base' objects to a 'base' folder

647faf3

Signed-off-by: Joe Elliott <[email protected]>

tests cleanup

124ff51

Signed-off-by: Joe Elliott <[email protected]>

clean up

2ebca6a

Signed-off-by: Joe Elliott <[email protected]>

Added header logic

c17df00

Signed-off-by: Joe Elliott <[email protected]>

added page header

9f39296

Signed-off-by: Joe Elliott <[email protected]>

Added min/max ids and checksums to index header

26d4f87

Signed-off-by: Joe Elliott <[email protected]>

cleanup

5ebab60

Signed-off-by: Joe Elliott <[email protected]>

note

d8f84e4

Signed-off-by: Joe Elliott <[email protected]>

Merge branch 'master' into v2-for-real

1b9263a

Added binary record search

6ab23dc

Signed-off-by: Joe Elliott <[email protected]>

cleanup

efa6db0

Signed-off-by: Joe Elliott <[email protected]>

switched to xxhash

ed5cf77

Signed-off-by: Joe Elliott <[email protected]>

removed min/max ids

ad0ddcf

Signed-off-by: Joe Elliott <[email protected]>

Merge branch 'master' into v2-for-real

401145e

all tests pass

ee94f9e

Signed-off-by: Joe Elliott <[email protected]>

Added require

8c3cdd4

Signed-off-by: Joe Elliott <[email protected]>

Added/adjusted defaults

b455070

Signed-off-by: Joe Elliott <[email protected]>

lint/casting

04022e8

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott requested review from annanay25, dgzlopes and mdisibio as code owners March 8, 2021 20:42

joe-elliott added 3 commits March 8, 2021 15:44

changelog

7e90318

Signed-off-by: Joe Elliott <[email protected]>

go mod

1a90dd8

Signed-off-by: Joe Elliott <[email protected]>

pageReader/Writer => dataReader/Writer

d5b7728

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott commented Mar 9, 2021

tempodb/encoding/v2/index_reader.go (outdated, resolved)

Added search with errors

2448339

Signed-off-by: Joe Elliott <[email protected]>

annanay25 approved these changes Mar 10, 2021


mdisibio approved these changes Mar 10, 2021


joe-elliott merged commit fc4f477 into grafana:master Mar 10, 2021

joe-elliott mentioned this pull request Mar 11, 2021

Cache the first N cache index page requests #587

Closed

This was referenced Mar 12, 2021

Bump tempo from a8bf220 to cbd6102 querycap/tempo#41

Closed

Bump tempo from a8bf220 to 0712134 querycap/tempo#42

Closed

Bump tempo from a8bf220 to 8c84b34 querycap/tempo#43

Closed

Bump tempo from a8bf220 to f7093f5 querycap/tempo#44

Closed
